Term-based Identification of Sentences for Text Summarisation

Authors

  • Byron Georgantopoulos
  • Stelios Piperidis
Abstract

The present paper describes a methodology for automatic summarisation of Greek texts which combines terminology extraction and sentence spotting. Since generating abstracts has proven a hard NLP task of questionable effectiveness, the paper focuses on producing a special kind of abstract, called an extract: a set of sentences taken from the original text. These sentences are selected on the basis of the amount of information they carry about the subject content. The proposed corpus-based, statistical approach exploits several heuristics to determine the summary-worthiness of sentences. It uses statistical occurrences of terms (the TF·IDF formula) and several cue phrases to calculate sentence weights, and then extracts the top-scoring sentences, which form the extract.
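To make the scoring idea in the abstract concrete, the following is a minimal Python sketch of TF·IDF-weighted sentence extraction with a cue-phrase bonus. It is not the authors' implementation. The function names, the regex tokeniser and sentence splitter, the IDF variant log(N / df), the additive `cue_bonus`, and the use of plain word tokens in place of extracted terminology are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware, so Greek text also works.
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def idf_table(corpus):
    # corpus: list of background document strings; idf(t) = log(N / df(t)).
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def extract(text, corpus, cue_phrases=(), cue_bonus=1.0, top_n=2):
    """Return the top_n highest-weighted sentences in their original order."""
    idf = idf_table(corpus)
    default_idf = math.log(len(corpus))  # unseen terms are treated as df = 1
    tf = Counter(tokenize(text))         # term frequency within the document
    scored = []
    for position, sentence in enumerate(split_sentences(text)):
        # Sum TF*IDF over the distinct tokens of the sentence.
        weight = sum(tf[t] * idf.get(t, default_idf)
                     for t in set(tokenize(sentence)))
        # Hypothetical cue-phrase heuristic: a fixed bonus per matching phrase.
        lowered = sentence.lower()
        weight += cue_bonus * sum(1 for cue in cue_phrases
                                  if cue.lower() in lowered)
        scored.append((weight, position, sentence))
    best = sorted(scored, key=lambda item: item[0], reverse=True)[:top_n]
    return [sentence for _, _, sentence in sorted(best, key=lambda item: item[1])]
```

Calling `extract(text, corpus, cue_phrases=("in conclusion",), top_n=3)` would return the three highest-weighted sentences of `text`, using the documents in `corpus` to estimate the IDF values. A real system along the lines of the abstract would score extracted terms rather than raw tokens and would tune the cue-phrase weights on a corpus.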

Similar resources

Unsupervised Biographical Event Extraction Using Wikipedia Traffic

Biographical summarisation can provide succinct and meaningful answers to the question “Who is X?”. Current supervised summarisation approaches extract sentences from documents using features from textual context. In this paper, we explore a novel approach to biographical summarisation, by extracting important sentences from an entity’s Wikipedia page based on internet traffic to the page over ...

The influence of personal pronouns for automatic summarisation of scientific articles

In automatic summarisation, statistical methods based on tokens’ frequency are commonly used in combination with other methods or on their own to extract important sentences from a text. Quite often researchers justify the relatively poor performance of these statistical methods by the fact that they do not consider the anaphoric relations between words. In this paper, we perform a comprehensiv...

Impact of Citing Papers for Summarisation of Clinical Documents

In this paper we show that information from citing papers can help perform extractive summarisation of medical publications, especially when the amount of text available for development is limited. We used the data of the TAC 2014 biomedical summarisation task. We report several methods to find the reference paper sentences that best match the citation text from the citing papers (“citances”). ...

A Rhetorical Status Classifier For Legal Text Summarisation

We describe a classifier which determines the rhetorical status of sentences in texts from a corpus of judgments of the UK House of Lords. Our summarisation system is based on the work of Teufel and Moens where sentences are classified for rhetorical status to aid sentence selection. We experiment with a variety of linguistic features with results comparable to Teufel and Moens, thereby demonst...

Automatic Annotation of Corpora for Text Summarisation: A Comparative Study

This paper presents two methods which automatically produce annotated corpora for text summarisation on the basis of human produced abstracts. Both methods identify a set of sentences from the document which conveys the information in the human produced abstract best. The first method relies on a greedy algorithm, whilst the second one uses a genetic algorithm. The methods allow to specify the ...

Publication date: 2000